import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
import plotly.subplots as sp
from statsmodels.tsa.seasonal import seasonal_decompose, STL # used for decomposition
pio.renderers.default = "notebook" # set the default plotly renderer to "notebook" (necessary for quarto to render the plots)11 Time Series Decomposition
This section is still under construction and will be completed in the near future. Please do not go beyond this point for now.
Time series data often exhibit patterns such as trends, seasonality, and cycles. Decomposing a time series into its components can help to better understand the underlying processes.
Time series are usually decomposed into three components: - Trend: \(T_t\), long-term progression of the series (e.g., increasing or decreasing). - Seasonality: \(S_t\), regular patterns that repeat over a fixed period (e.g., daily, weekly, yearly). - Residual: \(I_t\), random noise or irregular component.
Additive models assume that the components add together to form the time series: \[ Y_t = T_t + S_t + I_t \]
Multiplicative models assume that the components multiply together to form the time series: \[ Y_t = T_t \times S_t \times I_t \]
Sometimes, also a fourth component, Cyclic (\(C_t\)), is considered, which represents long-term oscillations that are not of fixed period.
11.1 Examples
Imports
This example uses the package statsmodels for time series decomposition.
Generating Example Data
np.random.seed(42) # for reproducibility
date_range = pd.date_range(start="2010-01-01", end="2024-12-31", freq="MS")
trend = 100 + 0.9 * np.arange(len(date_range)) # nonlinear trend
seasonality = 121 * np.sin(np.pi * (date_range.month / 12)) # yearly seasonality
noise = np.random.normal(0, 21, len(date_range)) # random noise at each point
# Add outliers with a probability of 0.05
outlier_mask = np.random.rand(len(date_range)) < 0.05
outlier_values = np.random.normal(10, 100, len(date_range)) # large outliers
noise[outlier_mask] += outlier_values[outlier_mask]
sales = trend + seasonality + noise # final data as sum of all components
df = pd.DataFrame({"Date": date_range, "Sales": sales}).set_index("Date")In the above plot, we can see a clear upward trend, but identifying seasonality is more difficult.
To support this visual analysis, we add vertical lines at the beginning of each year.
Now it is apparent that there is a yearly seasonal pattern, with peaks around mid-year.
Algorithmic decomposition of a time series into its components is not a trivial task. Many algorithms exist to achieve this, each with its own advantages and disadvantages.
In the following, we use a very simple additive model based on moving averages to decompose our time series. First, the trend is estimated using a moving average with a smaller window size. Then, the detrended series is calculated by subtracting the trend from the original series. The seasonal component is estimated by averaging the values for each season (e.g., each month, year, ..) in the detrended time series. Finally, the residual component is calculated by subtracting both the trend and seasonal components from the original series.
Note that this is a very naive approach and according to the documentation of seasonal_decompose a more sophisticated method should be preferred. However, it is easy to understand and serves well for demonstration purposes.
Here is a helper function to create a plotly figure with four plots.
def seasonal_decompose_plotly(decomposed_ts, title):
"""
Plots a time series decomposition using Plotly.
Parameters:
decomposed_ts (DecomposeResult): Decomposed time series result.
Returns:
plotly object
"""
# Extract components
observed = decomposed_ts.observed
trend = decomposed_ts.trend
seasonal = decomposed_ts.seasonal
resid = decomposed_ts.resid
# Create subplots
fig_decomp = sp.make_subplots(rows=4, cols=1, shared_xaxes=True,
subplot_titles=["Observed", "Trend", "Seasonal", "Residual"])
fig_decomp.add_trace(go.Scatter(x=observed.index, y=observed.values.flatten(), name="Observed"), row=1, col=1)
fig_decomp.add_trace(go.Scatter(x=trend.index, y=trend, name="Trend"), row=2, col=1)
fig_decomp.add_trace(go.Scatter(x=seasonal.index, y=seasonal, name="Seasonal"), row=3, col=1)
fig_decomp.add_trace(go.Scatter(x=resid.index, y=resid, name="Residual"), row=4, col=1)
fig_decomp.update_layout(height=900, title_text=title)
return fig_decompPlease note the different \(y\)-axes scales in the following plot.
A more robust method for time series decomposition is STL (Seasonal and Trend decomposition using Loess).
It not only allows for more flexibility in the decomposition process (e.g. look at seasonal changes over time), but is also more robust to outliers.
Having the decomposed components at hand, opens a variety of applications.
11.1.1 Denoising
Using the trend and seasonal components from the STL decomposition, we can create a denoised version of the original time series.
In previous plot, we can see that the estimated values follow the observed values quite closely, indicating that the STL decomposition has effectively captured the underlying patterns in the data. But we can also see some points with larger deviations, which are likely outliers in the original data.
11.1.2 Anomaly Detection
One way to quantify outliers is to apply a threshold to the residuals (e.g., 3 times the standard deviation of the residuals, also known as the 3-sigma rule).
resid = decomp_stl.resid
std_resid = resid.std()
fig = go.Figure()
fig.add_trace(go.Scatter(x=resid.index, y=resid, mode='lines', name='Residual'))
# Add horizontal lines for ±3 std. dev.
fig.add_hline(y=resid.mean() + 3 * std_resid, line_dash="dash", line_color="blue", opacity=0.7, name="+3 Std. Dev.")
fig.add_hline(y=resid.mean() - 3 * std_resid, line_dash="dash", line_color="blue", opacity=0.7, name="-3 Std. Dev.")
# Fill area between the lines
fig.add_traces([
go.Scatter(
x=resid.index,
y=[resid.mean() + 3 * std_resid] * len(resid),
mode='lines',
line=dict(width=0),
showlegend=False,
hoverinfo='skip'
),
go.Scatter(
x=resid.index,
y=[resid.mean() - 3 * std_resid] * len(resid),
mode='lines',
fill='tonexty',
fillcolor='rgba(0,0,255,0.08)',
line=dict(width=0),
showlegend=False,
hoverinfo='skip'
)
])
fig.update_layout(title="Residual with ±3 Std. Dev. Area", yaxis_title="Residual")
fig.show()So the outliers are:
| Sales | |
|---|---|
| Date | |
| 2014-09-01 | 447.316646 |
| 2016-07-01 | 139.561235 |
| 2018-03-01 | 489.942096 |
| 2021-06-01 | 227.748919 |
11.1.3 Detrending
Using the decomposed components also allows for detrending the time series, which can be useful for further analysis.
11.1.4 Deseasonalizing
Another important application is to deseasonalize the time series, which is often a prerequisite for forecasting models.
